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Abstract 

Designing a covariance function that represents the underlying correlation is a cru¬ 
cial step in modeling complex natural systems, such as climate models. Geospatial 
datasets at a global scale usually suffer from non-stationarity and non-uniformly 
smooth spatial boundaries. A Gaussian process regression using a non-stationary 
covariance function has shown promise for this task, as this covariance function 
adapts to the variable correlation structure of the underlying distribution. In this 
paper, we generalize the non-stationary covariance function to address the afore¬ 
mentioned global scale geospatial issues. We define this generalized covariance 
function as an intrinsic non-stationary covariance function, because it uses intrin¬ 
sic statistics of the symmetric positive definite matrices to represent the charac¬ 
teristic length scale and, thereby, models the local stochastic process. Experi¬ 
ments on a synthetic and real dataset of relative sea level changes across the world 
demonstrate improvements in the error metrics for the regression estimates using 
our newly proposed approach. 

1 Introduction 

Covariance functions are a key element in the regression of geospatial data (kriging). Designing a 
covariance function that can capture the geospatial random field of natural processes is useful for 
understanding the changes occurring in climate related variables. Eor example, ongoing sea level 
changes provide an important context for understanding future coastal flood risks [1]. However, 
most of the climate related datasets at a global scale suffer from statistical issues of non-stationarity 
and non-uniformly smooth spatial boundaries. In this paper, we design a covariance function which 
addresses these issues in the datasets. 

Stochastic processes of complex systems at a global scale are known to have a regional geophysical 
variability. Eor example, the local sea-level changes occurring near Northern Europe are primarily 
dominated by the physics of the Glacial-isostatic adjustment, while the local sea-level changes 
occurring near Japan are dominated by the tectonics [2]. Such regional variations can be modeled 
using the non-stationary covariance function (Eigures l(a,b) depicts these issues). 

One of the important modeling issues in the complex geospatial dataset at a global scale is knowing 
the boundaries of the regional geophysical variability. Addressing this issue, which we call the non- 
uniformly smooth spatial boundary issue, aids in modeling the non-stationary covariance function by 
capturing the underlying correlation structure of a region. This paper addresses this issue for kriging. 
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(a) (b) 

Figure 1: Experimental datasets, (a) Synthetic data of the relative sea-level change from the Glacial- 
isostatic adjustment model, and (b) real tide gauge data of the rate of change of the sea level during 


the last 20 years across the world. 

For standard kriging models (GP), the methods given in [3] are widely used in many climate 
models, including models of the sea level data, because it gives a non-parametric method for 
analyzing such a geospatial random field. Such a model can be completely defined by its covariance 
function, and it learns the hyper-parameters of the designed covariance function directly from the 
training data without making any parametric assumptions about the data. 

To model the non-stationarity of the stochastic process, [4, 5, 6, 7] devised a non-stationary covari¬ 
ance function. These methods use spatially evolving smooth kernels to represent the characteristic 
length scale (CLS) of the covariance function, which, in turn, models the local correlation structure 
of the global scale stochastic process. From these methods, [4] is the most promising approach 
for climate data, as its model is entirely in the non-parametric GP framework. Even so, it uses a 
homogeneous set of hyper-parameters to ensure the smoothness of the CLS in the input space. We 
show in our experiments that using a homogeneous set of hyperparameters is inefficient for the 
global-scale complex systems. 

Non-stationarity has also been addressed by [8]. They use a multi-resolution basis function, fixed 
rectangular grid, and explicit thresholding scheme. The approach of setting fixed grids renders 
this method unviable for datasets with non-uniformly smooth boundaries. While [4] validates their 
model on precipitation estimates in Colorada (USA), [8] uses the daily ozone rate in the Midwest. 
Both of these datasets are locally dense, and, in turn, do not address the global scale climate data 
issues. 

In the literature of estimating global scale sea-level changes, techniques given by [9] use a Kalman 
filter model, and [2] use a mixture of GPs at various regional resolutions. However, these methods 
fail to tightly couple the local and global estimates in the spatial dimension. Additionally, these 
methods rely on the domain knowledge and a parametric framework. 

In this paper, we propose a covariance function to model the data and address the aforementioned 
issues at a global scale, and we call this covariance function the intrinsic non-stationary covariance 
function. We use intrinsic statistics [10] on the space of the CLS to capture the non-uniformly 
smooth spatial boundaries for the non-stationarity. Furthermore, we provide an algorithm for 
kriging using the proposed intrinsic covariance function, and we validate its applicability on two 
global scale datasets. 

2 Covariance functions used in climate modeling 

In this section, we provide relevant background about stationary and non-stationary covariance 
functions used in kriging, which helped in the development of our proposed framework of the 
intrinsic non-stationary covariance function. The standard kriging (GP) model tries to recover the 
underlying function, /, where yi ^ /(x^) y, from n observed data points The 

learned model is then used to compute the predictive distribution p(^*|x*, /) for a test point x*. 
Here, x G are points in the input space of dimension d, ^ G M are observed values, and y 
is observed noise. In our problem setting, x represents the 2D spatial coordinates (latitude and 
longitude) and y represents the rate of change of the sea level at x. 
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Assuming Gaussian noise r] ^ A/’(0, with a constant variance term the predictive distribu¬ 
tion is then given by A/’(/i, a^), where: li = k^{K^(7‘^I)~^y, + 

K(i^j) = k{xi, Xj), k^{i) = k{x^, Xi), k^^ = /c(x*, x*), y = (^i, • • • , and / is the identity 
matrix [3]. Here, Xj) is the covariance function which we are interested in modeling. 


2.1 Stationary covariance function 


For kriging, the correlation structure of the data is commonly modeled as a stationary covariance 
function when the underlying stochastic process of a function (/) is assumed to be stationary. A 
stationary isotropic covariance function that is motion and translation invariant can be defined as 

k{xi,Xj) = Here, (p : [0, oo) ^ M is a positive definite function, is the CLS, 

and d{xi,Xj) is the distance function (in geophysical measurements, this is commonly angular 
distance). As can be seen from the construction, when the CLS is large the correlation between the 
input space points is spatially small, and vice versa. In other words, the points that are far apart 
have a small effect on the inference step. 

When the CLS is differing with respect to the input space dimensions, d, the covariance function 
is called an anisotropic covariance function. One such form of this covariance function is given by 
k{xi,Xj) = (l){d{xi,Xj)'^{T^)~^d{xi,Xj)). Here, the CLS, S : ^ 5'+((i,]R), can be decomposed 

into S = T^DT. r is 3.d X k column vector and it gives the direction of relevance, D is 3.k x k 
diagonal matrix and it gives the magnitude of relevance, and k is the relevant number of axis-aligned 
dimensions. 

To date, various <p for have been proposed. For spatial statistics, the Matem covariance 

function is a standard choice due to its flexibility in capturing the varying smoothness of the 
underlying distribution: 


k{xi,Xj) = Ks{v, v/%) 


ar^2-X{u) 


( 1 ) 


where Qij = p is the smoothness hyperparameter that controls the differentiahility of 

the function, cr/ is the signal variance, Ky is the modified Bessel function, and r(z/) is the gamma 
distribution. 


2.2 Non-stationary covariance function 

The non-stationary covariance function can be modeled using a covariance function that varies its 
CLS with respect to the input point of interest. The non-parametric model form for such a co- 
variance function is given by k{xi,Xj) = (j){Qij), Qij = d^Xi^XjY")~^d{xi^Xj), and 
T^i := Tj{xi) = T^Xi)"^D{xi)T{xi). For this non-stationary covariance function to be valid, [6] 
shows (using a convolution of kernels) that S(x) : ^ 5'+((i, M) should be spatially evolving 

smooth kernels of symmetric positive definite matrices. 

A closed-form solution of the nonstationary Matern covariance function is given by [4] as: 


k{xi,Xj) = KNs{xi,Xj) 


|Ei^|2 


y Qij)' 


( 2 ) 


Intuitively, Equation (2) indicates that the covariance between two observed values (yi) is affected 
by the convolution (arithmetic average) of the local covariates (kernels) at the input locations (xi). 
Consequently, this results in a variable CLS at the target values (y*). 


2.3 Spatially evolving kernels for the characteristic length scales 

For modeling the CLS, [6] uses a predetermined area of the ellipse, [11] uses B-splines, and [4] 
uses a stationary GP. All of these methods capture the underlying local correlation of the data using 
N neighbors of Xi . In this paper, we are interested in models that ensure a locally smooth manifold 
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Figure 2: Proposed framework for the intrinsic non-stationary covariance function. 


of T^i, which lies in the space of positive definite matrices M) in a non-parameteric fashion, 

while still being entirely in a GP framework. [4] proposes the following eigen-decomposition of 
for the spatial data: 


r(x,) = 


,D{xi) = 


log(Ai) 

0 


0 

log(A2) 


5 ^UV - . 


(3) 


The eigen decomposition parameters {u, Ai}i are each, in turn, modeled using an independent sta¬ 
tionary GP in the latitude dimension of Xi, and {v, X 2 }i using an independent stationary GP in the 
longitude dimension of Xi. All four independent GPs have their own sets of hyperparamters. 


3 Intrinsic non-stationary covariance function 

In this section, we first propose our model of the covariance function (intrinsic non-stationary co- 
variance function), then give background for the intrinsic statistics that we employ to construct the 
intrinsic non-stationary covariance, and finally provide an algorithm for its implementation in krig- 
ing. 

In order to improve the model of the non-stationary covariance function that captures the variable 
regional information, while still maintaining a non-parametric model, we propose a new class of 
covariance functions and define this class as an intrinsic non-stationary covariance function. 

Definition 1. An intrinsic non-stationary covariance function assumes the form k{xi^ Xj) = 
HQij)’ ^here Qij = d{xi, Xj)^{f>ij)~^d{xi, Xj) and ifij := Sj). 

Here, f) : M) ^ 5'+(d, M) is an objective function of the form: 

:= argmin (4) 

where S is the CLS for the intrinsic non-stationary covariance function, is the intrinsic 

distance metric, M is the neighbors (including itself) of the geospatial point of interest i^j, and 
d{xi^Xj) is the distance in the input space. 

In our study, the aim of the objective function '0(-, •) is to model the regional geophysical CLS of 
the underlying stochastic process. Intuitively, S represents the intrinsic mean of the CLS at the 
input points Xj}, and it incorporates the regional CLS in its convolution step for the intrinsic 
covariance function. 

There are two important factors in modeling this function : 

1. Finding the neighborhood points that represent the regional information. 

2. Finding a representative of the CLS that describes the statistics (up to second-order) for the region 
of interest. 

Figure 2 depicts these two factors, and Section 3.2 gives one such method to construct these classes 
of functions. 
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Theorem 1. The intrinsic non-stationary covariance function, as defined above, is a valid non¬ 
stationary covariance function. 


Proof [6] shows that the construction of a covariance function using a moving average specification 
(i.e., convolution of kernels) leads to a non-stationary process. In our construction (Equation 4) we 
specify the convolution of kernels on a smooth manifold, preserving the properties of symmetric 
positive definite matrices, and, in turn, obtaining a non-stationary process definition of [6] with a 
positive definiteness of the covariance function. □ 

3.1 Intrinsic Statistics for the characteristic length scales 

To compute the objective function for the space of spatially smooth kernels we assume 

that lies on the Riemannian manifold A4 of positive definite matrices S~^(d, M). The Riemannian 
manifold is assumed to be geodesically complete and endowed with a canonical affine connection. 
[12] and [13] derived explicit forms of the geodesic distance on this manifold as: d(Ei,Ej) = 

\J\ where rn denotes the d eigenvalues of the matrix G S~^. 

[14] defined the empirical Riemannian mean S G S~^{d, M), and [15] gave an iterative form of S as 
a local minimum of the objective function : S~^{d, M) ^ IR+: 

1 ^ 

A2(Ei,...,I]iv) = ^y]ci2(Efe,E). (5) 

^ k=l 

Each of the N normal distributions p(. |E/c) is associated with a unique tangent vector [3k G S{d, M), 
such that S is mapped onto T^k by an exponential map ex.pj^{Pk) = 

The covariance matrix of a set of covariance matrices itself (i.e., in our model this is a covariance of 
a set of the CLS) is then defined as: 


1 ^ 

_1 El (6) 

fc=l 

The covariance of the set of the CLS gives us a measure by which to evaluate the properties of its 
distribution on a manifold. Eor example, when the tr (A) is small, it means that the set of covariances 
(CLS) are from the same normal distribution A/’(E|E, A) and are highly correlated. 

Note the similarity in the variable N of (5) and G ff} of (4), which shows that fi)ij is in fact 

the intrinsic Riemannian mean E. 

3.2 Algorithm for the intrinsic non-stationary covariance function 


Algorithm 1 Kriging for the Intrinsic Non-stationary Covariance Eunction 

1: Input: Training set {(x, Test points 

2: Output: Predictive densities p(^* |x*). 

3: Initialize estimates of E^ using Equation (3). 

4: Eind the characteristic length scale using Algorithm 2. 

5: Compute the intrinsic nonstationary covariance function using Equation (2). 
6: Compute the GP using p(^*|:r*, /) ^ A/’(/i, a). 


Algorithm 1 describes our framework for implementing the intrinsic non-stationary covariance 
function, and Eigure 2 depicts this general framework. We first obtain initial smooth estimates 
of the CLS (E^) by Equation (3), and then we update E^ with the aim of modeling the intrinsic 
function fi>ij using our approach proposed in Algorithm 2. Our method maintains the appropriate 
smoothness in the latent space of E^ due to the second-order intrinsic statistics on the Riemannian 
manifold. Additionally, our method captures the correlation of the E^ in its intrinsic latent space that 
the initial estimate failed to capture. Hence, the estimates for the CLS around sharp discontinuities 
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Algorithm 2 Intrinsic Characteristic Length Scale 

1: Input: {xi,Xj}, {xk,T.k]k=i. 

2 : Output: 'ipij. 

3: Find K nearest neighbors A/^ for Xi and JVj for Xj in the input space. 
4: Remove the S^’s from , such that, |tr(A^)| < threshold. 

5: Find using for Eq. 5. 

6: Remove the S^’s from , such that, \tr{Aj)\ < threshold. 

7: Find Sj using ) for Eq. 5. 

8: Find 2 pij using A^(Si, Sj) 


and separated regions of the spatial data field are improved, as shown in Figure 3. Finally, the CLS 
i'lpij) is used in Equation (2) for the GP regression model. 

For step (3) of Algorithm 1, [4] bound the to achieve the required smoothness and used 

the arithmetic mean to convolve and Sj for its covariance function. We used the empirical 
Riemannian mean rather than the arithmetic mean and, therefore, are free from fixing the bounds 
on the initial estimates of This allows the variable smoothness in Equation (2) to 

naturally evolve in the space of symmetric positive definite matrices. Note, the thresholds of 
variance are bounded in the space of and are not directly dependent on the input space. 

Additionally, [16] shows that the arithmetic mean causes larger determinants (than the original 
determinant) in the space of M), which the Riemannian mean avoids. 

Algorithm 2 uses two levels of nearness measures: 1) the usual distance metric directly on the input 
space that measures spatial proximity, and 2) the intrinsic statistics of that measure its proximity 
on the manifold. Here, we used \tr{Ai)\ to measure the correlation of the neighboring S^’s for its 
simplicity, but one could also use the K-nearest neighbors in the space of S^s. It would be worth 
exploring how the different statistical measures on the manifold of the CLS space could improve 
the kriging estimates when sharp jumps exist in the underlying distribution. 

For example, a CLS computation of a geospatial location that is close to the boundary of its geophys¬ 
ical region is less reliable when it is the function of its input space alone (i.e., in Algorithm 1, Step 
3). However, when the additional information of the neighboring CLS is incorporated into the model 
(i.e.. Algorithm 2, Step 4 and 5), one could potentially recover the CLS of the boundary points that is 
closer (Equation 5) to the CLS representative of its associated region. We show in the experimental 
section that this approach of Algorithm 2 is particularly useful for analyzing climate models. 

For the initialization of the CLS in Algorithm 1, Step (3), one can use the numerical implementation 
of either [4], [6], or [11]. In our experiments, we used [4], because the empirical testing showed 
that it gave the best results for our application. Similarly, we used the numerical implementation 
of [13] for the intrinsic statistics (Equations (5) and (6)). 

The K for the nearest neighbors and the threshold for the variance in Alg. 2 are found using the 
5-fold Cross Validation. To find the GP hyperparameters, for step (1) and step (5) in Alg. 1, we use 
a Markov Chain Monte Carlo (MCMC) sampling scheme that is similar to the one outlined in [5]. 

4 Experiments on climate related data 

To gain insight into the applicability of our proposed covariance function, we implemented and 
compared kriging with three different covariance functions: the widely used stationary anisotropic 
Matern covariance function (statGP) of [17], the baseline non-stationary covariance function 
(NSGP) of [4], and the intrinsic non-stationary covariance function (iNSGP) that we propose in 
this paper. These methods were evaluated using two standard performance measures (as described 
below) for kriging. The three datasets used are: the smooth 2d simulated dataset (SIM) given in 
[4], the geophysics driven synthetic^ data (CIA) as modeled in [18], and the global-scale complex 
naturally occurring real^ dataset (TG) collected in [19]. 

^http://www.psmsl.org/train_and_info/geo_signals/gia/peltier/ 

^http://www.psmsl.org/data/obtaining/ 
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Table 1: Evaluations of the simulated (SIM), synthetic (GIA), and real (TG) datasets. GIA and TG 
dataset results are in the units of mm/year. 


Methods 

SIM 

GIA (All) 

GIA (Reg.l) 

TG (All) 


sMSE 

nLPD 

sMSE 

nLPD 

sMSE 

nLPD 

sMSE 

nLPD 

StatGP 

0.0240 

0.311 

0.58 

3.08 

1.57 

19.23 

0.85 

2.81 

NSGP 

0.0237 

0.278 

0.56 

2.00 

1.24 

7.30 

0.75 

2.82 

iNSGP 

0.0235 

0.271 

0.54 

1.94 

1.03 

6.90 

0.71 

2.78 


The Simulated Dataset. For this study, we are interested in comparing our method with the 2d sim¬ 
ulated functions that have been previously used in the non-stationary covariance function literature. 
The experimental set up is given in [4]. Even though the simulation function is non-stationary, it 
is fairly smooth and homogeneous, and it lacks the complex regional geophysics that is usually 
encountered in global-scale climate related data. 

The Synthetic Dataset. For this study, we are interested in modeling the global-scale geophysical 
signal that is present in the climate datasets (such as tide gauge data). This synthetic dataset enables 
us to compare the three covariance functions (statGP, NSGP, iNSGP) with known geophysics 
boundaries. 

One such widely modeled geophysical signal is the Glacial-isostatic adjustment (GIA) model. We 
used the signal modeled in GIA [18], which gives the difference in the height between the sea 
surface and solid earth. Figure 1(a) shows the distribution of the underlying distribution {y) after 
masking out the land mass. To examine the robustness of our proposed model, the experimental 
setup includes 25 independent runs of the GIA data. For each run, we randomly sampled 315 
training points, added Gaussian noise {rj = 0.2) to sea level measurements, and tested on 946 
independent random sampled points. 

The Real Dataset. For this study, we are interested in a global-scale climate variable that contributes 
to future climate related risks, is known to have non-stationarity, and has complex regional geo¬ 
physics in its dataset. Hence, we used tide gauge sites measuring relative sea level measurements 
across the coastal areas of the globe (see Figure 1(b)). From the dataset given in [19], 747 locations 
were selected because they had consistent temporal records from the years 1993 to 2012. To focus 
our study on the geo-spatial set up, we used 747 locations of tide gauge sites to construct the rate of 
change of the sea level (mm/year), where the annual sea level rate of change was obtained from linear 
regression estimates in the temporal dimension. The experimental set up then includes 25 indepen¬ 
dent runs with randomly sampled 374 and 373 locations for respectively the training and test set. 

Evaluation metrics. We report performance with respect to two widely adopted metrics in 
kriging: the standardized mean squared error (sMSE) and the negative log predictive density 
(nLPD). sMSE measures the point estimate errors in the predictions and is given by: sMSE = 
n~^ ~ yi)‘^^ where n is the number of test points, yl is the predictive mean at 

input space Xi, and var(^) is the sample variance. nLPD measures not just the point estimates error, 
but also the error variance of the predictions and is given by: nLPD = —n~^ 

5 Results 


Table 1 summarizes the performance for the three covariance functions (statGP, NSGP, and iNSGP) 
when implemented on the three datasets (SIM, GIA, and TG). The performance of iNSGP is partic¬ 
ularly improved for the GIA and TG datasets, while it does not show much of an improvement over 
NSGP for the SIM dataset. This is mainly because the SIM dataset is fairly smooth. Furthermore, 
the SIM data does not suffer from the issue of a non-uniformly smooth spatial boundary (regional 
geophysics) that is present in GIA and TG datasets Fig 1(a). 

For example. Fig 3(a) shows the true values of GIA data near the Barent sea (Reg.l), which is 
a marginal sea of the Arctic ocean. The CLS estimates for this region (which is parametrically 
plotted as ellipses) from statGP, NSGP, and iNSGP are shown in Fig 3(b,c,d). Note the differences 
in the shape of the ellipses numbered (1,2,3) in Fig 3(b,c,d). These points (1,2,3) correspond to 
the geospatial boundary points of the regions (1,2,3) in Fig 3(a). While statGP has the same shape 
for all of the points, NSGP shows some variation in its shape. Even so, they are largely similar. 
On the other hand, iNSGP has a distinct variation in the the points (1,2,3) and well represents the 
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Figure 3: GIA dataset around Barrent sea (Reg.l) (a) GIA data (true values), (b) Stationary GP 
(statGP) characteristic length scale (CLS), (c) Non-stationary GP (NSGP) CLS, and (d) Intrinsic 
non-stationary GP (iNSGP) CLS. 

differences in these three regions. This is explained by the superior performance (Tab. 1) of iNSGP 
over NSGP and statGP (particularly in the GIA Reg.l). 

Table 1, shows the average values for the error metrics. The maximum standard deviation in 
sMSE for SIM was 0.003, for GIA was 0.02, and for TG was 0.5. For each of these runs, iNSGP 
performed as well as NSGP for the SIM dataset, and outperformed NSGP and statGP for the TG 
and GIA datasets. The error values for the SIM dataset when applying the methods of StatGP and 
NSGP were similar to the error values found in [4]. 

From all three datasets, the real dataset (TG) shows the most improvement in its error metric when 
applying the iNSGP method. The high improvement is because TG has large variability in the 
regional geophysics; therefore, the intrinsic non-stationary covariance function is able to better 
model the underlying true distribution than the stationary and non-stationary covariance functions. 


6 Discussion and Concluding Remarks 

We introduced a new class of covariance functions, which we call an intrinsic non-stationary 
covariance function. This covariance function is especially useful in modeling global scale 
geospatial datasets that have a non-stationary process and non-uniformly smooth spatial boundaries 
due to regional geophysics. We developed a framework to apply this covariance function for 
kriging. Using the sea level dataset from the Glacial-isostatic adjustment model and tide gauge 
measurements, we demonstrated our framework’s improved performance in kriging when compared 
with the non-stationary covariance function. 

There are many facets of implementation of the intrinsic non-stationary covariance function that 
could be undertaken in the future. One of the important issues, especially outside of the geo-spatial 
community, is large datasets. Specifically, one can use sparse regression techniques [20] and a 
computationally cost effective metric for the intrinsic statistics on the positive definite matrices of 
the characteristic length scale [16] to deal with large datasets. 
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Other climate related variables, such as temperature and precipitation records, face similar mod¬ 
eling issues as the sea level dataset that we used for our application. Future work will explore the 

application and methods of our intrinsic covariance function to such geospatial datasets, with the 

larger goal of aiding in the assessment of future climate related risks. 
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